AITopics | cognate class

Beyond cognacy

Jäger, Gerhard

arXiv.org Artificial Intelligence, Jul-8-2025

Computational phylogenetics has become an established tool in historical linguistics, with many language families now analyzed using likelihood-based inference. However, standard approaches rely on expert-annotated cognate sets, which are sparse, labor-intensive to produce, and limited to individual language families. This paper explores alternatives by comparing the established method to two fully automated methods that extract phylogenetic signal directly from lexical data. One uses automatic cognate clustering with unigram/concept features; the other applies multiple sequence alignment (MSA) derived from a pair-hidden Markov model. Both are evaluated against expert classifications from Glottolog and typological data from Grambank. The intrinsic strength of the phylogenetic signal in each character type is also compared. Results show that MSA-based inference yields trees more consistent with linguistic classifications, better predicts typological variation, and provides a clearer phylogenetic signal, suggesting it as a promising, scalable alternative to traditional cognate-based methods. This opens new avenues for global-scale language phylogenies beyond expert annotation bottlenecks.

artificial intelligence, language family, machine learning, (17 more...)

2507.03005

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.40)
North America > United States (0.14)
North America > Mexico > Mexico City > Mexico City (0.04)
(3 more...)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.89)

Computational Approaches for Integrating out Subjectivity in Cognate Synonym Selection

Häuser, Luise, Jäger, Gerhard, Stamatakis, Alexandros

arXiv.org Artificial Intelligence, Jun-5-2024

Working with cognate data involves handling synonyms, that is, multiple words that describe the same concept in a language. In the early days of language phylogenetics it was recommended to select one synonym only. However, as we show here, binary character matrices, which are used as input for computational methods, do allow for representing the entire dataset including all synonyms. Here we address the question of how one can, and whether one should, include all synonyms, or whether it is preferable to select synonyms a priori. To this end, we perform maximum likelihood tree inferences with the widely used RAxML-NG tool and show that it yields plausible trees when all synonyms are used as input. Furthermore, we show that a priori synonym selection can yield topologically substantially different trees and we therefore advise against doing so. To represent cognate data including all synonyms, we introduce two types of character matrices beyond the standard binary ones: probabilistic binary and probabilistic multi-valued character matrices. We further show that which character matrix type yields the RAxML-NG tree topologically closest to the gold standard depends on the dataset. We also make available a Python interface for generating all of the above character matrix types for cognate data provided in CLDF format.
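As a rough illustration of how a standard binary character matrix can hold every synonym, the sketch below creates one character per (concept, cognate class) pair and marks it `1` for each language that attests that class. This is not the authors' Python interface and it does not read CLDF. The function name, the tuple input format, the toy labels `L1`-`L3` and `A`-`D`, and the use of `-` for a concept with no recorded word are all assumptions made for the example. The probabilistic matrix types are not covered.

```python
from collections import defaultdict


def binary_character_matrix(records, missing="-"):
    """Encode cognate data as one binary character per (concept, cognate class).

    records: iterable of (language, concept, cognate_class) tuples. A language
    may appear several times for the same concept; those are its synonyms.
    Returns (languages, characters, matrix), where matrix maps each language
    to a string with one symbol per character.
    """
    records = list(records)
    languages = sorted({lang for lang, _, _ in records})
    # Class labels are only meaningful within a concept, so key on the pair.
    characters = sorted({(concept, cls) for _, concept, cls in records})

    attested = defaultdict(set)  # language -> {(concept, class)}
    covered = defaultdict(set)   # language -> {concept}
    for lang, concept, cls in records:
        attested[lang].add((concept, cls))
        covered[lang].add(concept)

    matrix = {}
    for lang in languages:
        row = []
        for concept, cls in characters:
            if concept not in covered[lang]:
                row.append(missing)   # no word recorded for this concept
            elif (concept, cls) in attested[lang]:
                row.append("1")       # language has a word in this class
            else:
                row.append("0")       # concept covered, class absent
        matrix[lang] = "".join(row)
    return languages, characters, matrix


# Hypothetical toy data, not real cognate judgments.
records = [
    ("L1", "big", "A"),
    ("L2", "big", "B"),
    ("L2", "big", "C"),   # second synonym for "big" in L2
    ("L3", "big", "A"),
    ("L1", "hand", "D"),
    ("L2", "hand", "D"),  # L3 has no entry for "hand"
]
languages, characters, matrix = binary_character_matrix(records)
for lang in languages:
    print(lang, matrix[lang])
```

Because both synonyms of L2 set their own character to `1`, no a priori choice between them is needed; the cost is that characters belonging to the same concept are no longer mutually exclusive.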

character matrix, dataset, matrix, (14 more...)

2404.19328

Country:

North America > United States (0.14)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.05)
(3 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)

Multiple evolutionary pressures shape identical consonant avoidance in the world's languages

Cathcart, Chundra A.

arXiv.org Artificial Intelligence, Oct-2-2023

Languages disfavor word forms containing sequences of similar or identical consonants, due to the biomechanical and cognitive difficulties posed by patterns of this sort. However, the specific evolutionary processes responsible for this phenomenon are not fully understood. Words containing sequences of identical consonants may be less likely to arise than those without; processes of word form mutation may be more likely to remove than create sequences of identical consonants in word forms; finally, words containing identical consonants may die out more frequently than those without. Phylogenetic analyses of the evolution of homologous word forms indicate that words with identical consonants arise less frequently than those without, and processes which mutate word forms are more likely to remove sequences of identical consonants than introduce them. However, words with identical consonants do not die out more frequently than those without. Further analyses reveal that forms with identical consonants are replaced in basic meaning functions more frequently than words without. Taken together, results suggest that the underrepresentation of sequences of identical consonants is overwhelmingly a byproduct of constraints on word form coinage, though processes related to word usage also serve to ensure that such patterns are infrequent in more salient vocabulary items. These findings clarify previously unknown aspects of processes of lexical evolution and competition that take place during language change, optimizing communicative systems.

cognate class, sequence, word form, (15 more...)

2309.14006

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Germany > Saxony > Leipzig (0.04)
(13 more...)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
